90 research outputs found

    Global alignment of pairwise protein interaction networks for maximal common conserved patterns

    Get PDF
    A number of tools for the alignment of protein-protein interaction (PPI) networks have laid the foundation for PPI network analysis. Most of alignment tools focus on finding conserved interaction regions across the PPI networks through either local or global mapping of similar sequences. Researchers are still trying to improve the speed, scalability, and accuracy of network alignment. In view of this, we introduce a connected-components based fast algorithm, HopeMap, for network alignment. Observing that the size of true orthologs across species is small comparing to the total number of proteins in all species, we take a different approach based on a precompiled list of homologs identified by KO terms. Applying this approach to S. cerevisiae (yeast) and D. melanogaster (fly), E. coli K12 and S. typhimurium, E. coli K12 and C. crescenttus, we analyze all clusters identified in the alignment. The results are evaluated through up-to-date known gene annotations, gene ontology (GO), and KEGG ortholog groups (KO). Comparing to existing tools, our approach is fast with linear computational cost, highly accurate in terms of KO and GO terms specificity and sensitivity, and can be extended to multiple alignments easily

    On parameterized complexity of the Multi-MCS problem

    Get PDF
    AbstractWe introduce the maximum common subgraph problem for multiple graphs (Multi-MCS) inspired by various biological applications such as multiple alignments of gene sequences, protein structures, metabolic pathways, or protein–protein interaction networks. Multi-MCS is a generalization of the two-graph Maximum Common Subgraph problem (MCS). On the basis of the framework of parameterized complexity theory, we derive the parameterized complexity of Multi-MCS for various parameters for different classes of graphs. For example, for directed graphs with labeled vertices, we prove that the parameterized m-Multi-MCS problem is W[2]-hard, while the parameterized k-Multi-MCS problem is W[t]-hard (∀t≥1), where m and k are the size of the maximum common subgraph and the number of multiple graphs, respectively. We show similar results for other parameterized versions of the Multi-MCS problem for directed graphs with vertex labels and undirected graphs with vertex and edge labels by giving linear FPT reductions of the problems from parameterized versions of the longest common subsequence problem. Likewise, for unlabeled undirected graphs, we show that a parameterized version of the Multi-MCS problem with a fixed number of input graphs is W[1]-complete by showing a linear FPT reduction to and from a parameterized version of the maximum clique problem

    Learning Contextual Embeddings for Knowledge Graph Completion

    Get PDF
    Knowledge Graphs capture entities and their relationships. However, many knowledge graphs are afflicted by missing data. Recently, embedding methods have been used to alleviate this issue via knowledge graph completion. However, most existing methods only consider the relationship in triples, even though contextual relation types, consisting of the surrounding relation types of a triple, can substantially improve prediction accuracy. Therefore, we propose a contextual embedding method that learns the embeddings of entities and predicates while taking contextual relation types into account. The main benefits of our approach are: (1) improved scalability via a reduced number of epochs needed to achieve comparable or better results with the same memory complexity, (2) higher prediction accuracy (an average of 14%) compared to the related algorithms, and (3) high accuracy for both missing entity and predicate predictions. The source code and the YAGO43k dataset of this paper can be found from (https://github.ncsu.edu/cmoon2/kg)

    WebBANC: Building Semantically-Rich Annotated Corpora from Web User Annotations of Minority Languages

    Get PDF
    Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), 48-56. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9206

    In-silico identification of phenotype-biased functional modules

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Phenotypes exhibited by microorganisms can be useful for several purposes, e.g., ethanol as an alternate fuel. Sometimes, the target phenotype maybe required in combination with other phenotypes, in order to be useful, for e.g., an industrial process may require that the organism survive in an anaerobic, alcohol rich environment and be able to feed on both hexose and pentose sugars to produce ethanol. This combination of traits may not be available in any existing organism or if they do exist, the mechanisms involved in the phenotype-expression may not be efficient enough to be useful. Thus, it may be required to genetically modify microorganisms. However, before any genetic modification can take place, it is important to identify the underlying cellular subsystems responsible for the expression of the target phenotype.</p> <p>Results</p> <p>In this paper, we develop a method to identify statistically significant and phenotypically-biased functional modules. The method can compare the organismal network information from hundreds of phenotype expressing and phenotype non-expressing organisms to identify cellular subsystems that are more prone to occur in phenotype-expressing organisms than in phenotype non-expressing organisms. We have provided literature evidence that the phenotype-biased modules identified for phenotypes such as hydrogen production (dark and light fermentation), respiration, gram-positive, gram-negative and motility, are indeed phenotype-related.</p> <p>Conclusion</p> <p>Thus we have proposed a methodology to identify phenotype-biased cellular subsystems. We have shown the effectiveness of our methodology by applying it to several target phenotypes. The code and all supplemental files can be downloaded from (<url>http://freescience.org/cs/phenotype-biased-biclusters/</url>).</p

    DENSE: efficient and prior knowledge-driven discovery of phenotype-associated protein functional modules

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Identifying cellular subsystems that are involved in the expression of a target phenotype has been a very active research area for the past several years. In this paper, <it>cellular subsystem </it>refers to a group of genes (or proteins) that interact and carry out a common function in the cell. Most studies identify genes associated with a phenotype on the basis of some statistical bias, others have extended these statistical methods to analyze functional modules and biological pathways for phenotype-relatedness. However, a biologist might often have a specific question in mind while performing such analysis and most of the resulting subsystems obtained by the existing methods might be largely irrelevant to the question in hand. Arguably, it would be valuable to incorporate biologist's knowledge about the phenotype into the algorithm. This way, it is anticipated that the resulting subsytems would not only be related to the target phenotype but also contain information that the biologist is likely to be interested in.</p> <p>Results</p> <p>In this paper we introduce a fast and theoretically guranteed method called <it>DENSE </it>(Dense and ENriched Subgraph Enumeration) that can take in as input a biologist's <it>prior </it>knowledge as a set of query proteins and identify all the dense functional modules in a biological network that contain some part of the query vertices. The density (in terms of the number of network egdes) and the enrichment (the number of query proteins in the resulting functional module) can be manipulated via two parameters γ and <it>μ</it>, respectively.</p> <p>Conclusion</p> <p>This algorithm has been applied to the protein functional association network of <it>Clostridium acetobutylicum </it>ATCC 824, a hydrogen producing, acid-tolerant organism. The algorithm was able to verify relationships known to exist in literature and also some previously unknown relationships including those with regulatory and signaling functions. Additionally, we were also able to hypothesize that some uncharacterized proteins are likely associated with the target phenotype. The DENSE code can be downloaded from <url>http://www.freescience.org/cs/DENSE/</url></p

    Impact of Pretreated Switchgrass and Biomass Carbohydrates on Clostridium thermocellum ATCC 27405 Cellulosome Composition: A Quantitative Proteomic Analysis

    Get PDF
    Background: Economic feasibility and sustainability of lignocellulosic ethanol production requires the development of robust microorganisms that can efficiently degrade and convert plant biomass to ethanol. The anaerobic thermophilic bacterium Clostridium thermocellum is a candidate microorganism as it is capable of hydrolyzing cellulose and fermenting the hydrolysis products to ethanol and other metabolites. C. thermocellum achieves efficient cellulose hydrolysis using multiprotein extracellular enzymatic complexes, termed cellulosomes. Methodology/Principal Findings: In this study, we used quantitative proteomics (multidimensional LC-MS/MS and 15N-metabolic labeling) to measure relative changes in levels of cellulosomal subunit proteins (per CipA scaffoldin basis) when C. thermocellum ATCC 27405 was grown on a variety of carbon sources [dilute-acid pretreated switchgrass, cellobiose, amorphous cellulose, crystalline cellulose (Avicel) and combinations of crystalline cellulose with pectin or xylan or both]. Cellulosome samples isolated from cultures grown on these carbon sources were compared to 15N labeled cellulosome samples isolated from crystalline cellulose-grown cultures. In total from all samples, proteomic analysis identified 59 dockerin- and 8 cohesin-module containing components, including 16 previously undetected cellulosomal subunits. Many cellulosomal components showed differential protein abundance in the presence of non-cellulose substrates in the growt
    corecore